Spark Workflow
A typical Spark workflow consists of ingestion, processing, storage, and analytics:
- Ingest data from sources
  - HDFS, NoSQL stores, S3, real-time sources, etc.
- Transform the data
  - Filter, clean, join, enhance
- Persist the processed data
  - Memory, HDFS, NoSQL stores
- Interactive analytics
  - Shells, Spark SQL, third-party tools
- Machine learning
- Actions (operations that trigger execution and return or save results)
Each of these tasks is explained in detail in later sections.
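The steps above can be sketched end to end. Since Spark's actual APIs are covered in later sections, this is a minimal plain-Python sketch of the same ingest → transform → persist → analyze pattern; all record fields and values are made up for illustration, and the comments name the Spark operations each step corresponds to.

```python
import json
import os
import tempfile

# Ingest: an in-memory list of raw records stands in for an
# HDFS / S3 / NoSQL source (Spark: spark.read or sc.textFile)
raw_events = [
    {"user_id": 1, "event": "click", "amount": 10},
    {"user_id": 2, "event": "view", "amount": None},
    {"user_id": 3, "event": "click", "amount": 25},
]
users = {1: "alice", 3: "bob"}  # lookup table to join against

# Transform: filter, clean, join, enhance -- the same steps Spark
# performs at cluster scale with filter()/dropna()/join()/withColumn()
clicks = [e for e in raw_events if e["event"] == "click"]       # filter
cleaned = [e for e in clicks if e["amount"] is not None]        # clean
joined = [{**e, "name": users[e["user_id"]]}                    # join
          for e in cleaned if e["user_id"] in users]

# Persist: write the processed records out
# (stand-in for caching in memory or saving to HDFS/Parquet)
path = os.path.join(tempfile.mkdtemp(), "processed.json")
with open(path, "w") as f:
    json.dump(joined, f)

# Analytics / action: a simple aggregate over the persisted data
# (in Spark, an action like collect() or count() triggers execution)
with open(path) as f:
    total = sum(r["amount"] for r in json.load(f))
print(total)  # → 35
```

In real Spark code the transformations are lazy: nothing runs until an action (such as `collect()` or a save) is invoked, which is why "Actions" appears as its own step in the workflow above.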